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Introduction 

Coarse-grained (CG) models have proven to be use¬ 
ful in many fields of chemical research,allow¬ 
ing molecular simulation to be performed on larger 
system sizes and access longer timescales than is 
possible with atomistic-level models, enabling com¬ 
plex phenomen a such as hierarchical self-assembly 
to be described.! ^ In CG simulations of aqueous 
systems, especially ones with significant amounts 
of hydrophobic and/or hydrophilic interactions, the 
water model is important and can have a major 
impact on the resulting properties of the system.^ 
While the assignment of atoms to CG beads (i.e., 
defining the CG mapping) is relatively straightfor¬ 
ward for most chemical systems (e.g., aggregating 
four methyl groups bonded in sequence into a sin¬ 
gle CG bead), mapping an atomistic water trajec¬ 
tory to the CG level (i.e., grouping several water 
molecules into a single CG bead) is not as well- 
defined given the lack of permanent bonds between 
water molecules. This ambiguity presents a prob¬ 
lem for structure-based methods that require an 
atomistic configuration to be mapped to the corre¬ 
sponding CG configuration, e.g., to generate target 
radial distribution functions (RDFs) against which 
the forcefield is optimized. As such, the majority 
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of many-to-one CG models of water (i.e., where 
one CG bead represents multiple water molecules) 
have instead been derived by assuming a functional 
form of the forcefield and optimizing the associated 
parameters to match selected physical properties 
of water, such as density, vaporization enthalpy, 
surface tension, etcFor example, Chiu et al. 
developed a 4:1 CG water forcefield by optimizing 
the parameters of a Morse potential to accurately 
reproduce the surface tension and density of liquid 
water.l^ Despite capturing the interfacial properties 
and density, this potential overestimates structural 
correlations, as one might expect given that struc¬ 
tural data was not used in the optimization. 

Recently, Hadley and McCabe^ proposed a 
method for mapping configurations of atomistic 
water to their CG representations using the k- 
means clustering algorithm. Subsequently in re¬ 
lated work, van Hoof et a/.^^ developed the CUMU¬ 
LUS method for mapping atoms to CG beads. Both 
methods enable dynamic mapping of multiple water 
molecules to a single CG bead, allowing structure- 
based schemes to be used. Here, dynamic refers to 
a CG mapping that changes over the course of the 
atomistic trajectory, i.e., different water molecules 
are assigned to different CG beads in each frame 
of the atomistic trajectory. Both works employed 
the iterative Boltzmann inversion (IBI)I^ method to 
derive the intermolecular interactions by optimiz¬ 
ing a numerical, rather than analytical, potential 
to reproduce RDFs calculate d from the atomistic- 
to-CG mapped configurations.The forcefields 
derived are similar and show good agreement with 
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the structural properties and density of the atomistic 
water models studied. However, neither model is 
able to accurately reproduce interfacial properties, 
since they were derived solely from bulk fluid data. 
This failure to capture the interfacial properties is 
a consequence of the single-state nature of the IBI 
approach, and may alter the balance of hydropho¬ 
bic and hydrophilic interactions when using these 
water models in multicomponent systems. 

Recently, the multistate IBI (MS IBI) method 
was developed as an extension of the original IBI 
approach,E21 with the goal of reducing state depen¬ 
dence and structural artifacts often found in IBI- 
based potentials .^231^6]^ significant issue related to 
the IBI method is that a multitude of potentials may 
give rise to similar RDFs, and the method cannot 
necessarily differentiate which of the many poten¬ 
tials is most accurate, as only RDF matching is 
considered. MS IBI operates based on the idea that 
different thermodynamic states will occupy differ¬ 
ent regions of potential “phase space” (i.e., regions 
where potentials give rise to similar RDFs), and 
that the most transferable, and thus most accurate, 
potential lies in the overlap of phase space for the 
different states. That is, by optimizing a potential 
simultaneously against multiple thermodynamic 
states, MS IBI provides constraints to the optimiza¬ 
tion, forcing the method to derive potentials that 
exist in this overlap region, and thus are transfer¬ 
able among the states considered. The MS IBI ap¬ 
proach has been shown to reduce state dependence 
and improve the quality of the derived potentials, 
as compared to the original IBI method.^ 

In this work, multistate iterative Boltzmann inver¬ 
sion (MS IBI) is used to derive an intermolecular 
potential that captures both bulk and interfacial 
properties of water, improving upon the CG wa¬ 
ter model of Hadley and McCabe .1^ Again, opti¬ 
mizations are carried out using the MS IBI method, 
where both bulk and interfacial systems are used si¬ 
multaneously as target conditions for the optimiza¬ 
tion. MS IBI is also used, for the first time, in a 
multi-ensemble context, enabling optimizations in 
both the canonical (NVT) and isothermal-isobaric 
(NPT) ensembles to be performed simultaneously 
to derive the density-pressure relationship of the 
system. To further constrain the optimization, a 
slightly modified version of the Chiu et al. CG 
water forcefield, optimized for surface tension, is 


used as a starting condition, allowing the MS IBI 
method to make specific modifications to the poten¬ 
tial to improve structural properties. The remainder 
of the paper is organized as follows: In Methods, a 
brief overview of the k-means clustering and MS 
IBI algorithms is given and the models used are de¬ 
scribed. The potential derivation is then presented, 
validated, and compared to existing CG water mod¬ 
els in the Results section and finally, conclusions 
are drawn about the applicability of the derived CG 
model and the broader applicability of the MS IBI 
method discussed. 

Methods 

^-means Clustering Algorithm 

Mapping a water trajectory to the CG level is in¬ 
herently different than mapping a larger molecule’s 
trajectory, since for water, atoms mapped to a sin¬ 
gle CG bead exist on different molecules. Further¬ 
more, the water molecules mapped to a common 
bead are not likely to remain associated throughout 
the full simulation because of thermal diffusion. 
A dynamic mapping scheme is therefore required 
for water. Following the work of Hadley and Mc¬ 
Cabe,!^ the k-means algorithm is used to map the 
atomistic water trajectory to the CG level. In short, 
k-means is a clustering algorithm that is used to 
find clusters of data points in a large data set. The 
positions of the water molecules are here analogous 
to the points in the data set and waters mapped to a 
single bead are analogous to the clusters. Mo re de- 
tails of the algorithm can be found elsewhere.^^^^^ 
While the k-means algorithm can be used to group 
together any number of water molecules, a 4:1 map¬ 
ping is chosen, as this was found in prior to provide 
the best balance between accuracy and computa¬ 
tional efficie ncy,^ a nd 4:1 models are common in 

the literature 

Multistate Iterative Boltzmann Inver¬ 
sion Method 

MS IBI was used to derive the intermolecular po¬ 
tential between water beads. The goal of MS IBI is 
to derive a single potential that can be used over a 
range of thermodynamic states. As an extension of 
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the original IBI method,!^ the potential is updated 
based on the average differenees in CG and target 
RDFs at multiple states (i.e., a single potential for 
eaeh pair is updated based on RDFs from multiple 
states). The potential is adjusted aeeording to 


Vt+iir)=VM 



as{r)kBTs\n 


.guy. 


, (1) 


where Vi{r) is the pair potential as a funetion of 
separation r at the iteration; N the number of 
target states; a{r) an effeetive weighting faetor for 
eaeh state, allowing more or less emphasis to be 
put on a partieular target state; ks the Boltzmann 
eonstant and T the absolute temperature; the 
RDF from the CG simulation at iteration i and state 
5; g*(r) the target RDF at state 5. CCs{r) was ehosen 
to be a linear funetion of the form 


— oco,i(i fjfcuy (2) 

sueh that a{rcut) = 0, and the potential remains 0 
for r > rcut- This form of a{r) also plaees more 
emphasis on the short-ranged part of the potential 
to suppress long-range struetural artifaets. 

An initial CG potential is assumed for eaeh pair 
interaction. In theory, there are no restrictions on 
the initial potential, so it may take any form; how¬ 
ever, in practice, the initial potential is often the 
potential of mean force (PMF) calculated from the 
Boltzmann inverted RDF. In this work, rather than 
taking an average of the PMFs over the states used, 
the initial potential used was slightly modified from 
the water model of Chiu et al, as discussed below. 

A CG simulation is then run with the initial po¬ 
tential. Based on the RDFs from the CG simulation, 
the potential is updated according to Equation 
The updated potential is used as input to the next 
cycle, and the process is repeated until some stop¬ 
ping criterion is met. Here, the stopping criterion 
is determined using the following fitness function 

/o“'dr|g'(r)-g*(r)| 

where the optimization is stopped when the value 
of ffit exceeds a specified value (i.e., meets some 
tolerance), given below. 


Models 

Atomistic simulations of pure water were per¬ 
formed with the TIP3P model.^ All atomistic sys¬ 
tems contained 5,832 water molecules and were 
simulated in LAMMPSl22B0l using a 1 fs timestep. 
A cutoff distance of 12 A was used for the van der 
Waals interactions; long-range electrostatics were 
handled with the PPPM method with a 12 A real 
space cutoff. Three distinct states were simulated: 
bulk, NVT at 1.0 g/mL and 305 K; bulk, NPT at 
305 K and 1.0 atm; and an NVT droplet state at 
305 K, where the box from the bulk NVT state was 
expanded by a factor of 3 in one direction. Each 
atomistic simulation was run for 7 ns. The atom¬ 
istic trajectories were mapped to the CG level using 
the k-means algorithm. Target RDEs were calcu¬ 
lated from the final 5 ns of the mapped trajectory 
from each state (bulk NVT, bulk NPT, and droplet 
NVT). MS IBI was performed using the target data 
from each of the three states. The initial guess of 
the potential is given as a Morse potential of the 
form 

V{r) = De (^e-2/3(r-r,,) _ j (4) 

where Vgq is the location of the potential minimum, 
—De is the value of the potential minimum, and 
/3 is related to the width of the potential well. 
Parameters are taken to be those from Chiu et 
al: De = 0.813 kcal/mol, j8 = 0.556 A 1, and 
rgq = 6.29 A, however, we note that the poten¬ 
tial was adjusted so that j8 = 0.5 A ^ for r < req. 
This change was made to increase sampling as 
small separations, because numerical issues arise 
in the potential update when the CG RDE is zero 
but the target RDE is nonzero. This modification 
of the potential will slightly alter the properties 
as compared to the original model, as discussed 
below. The potential update scaling factor ao,i 
(see Equations and was set to 0.7 to avoid 
large updates to the potential. The optimizations 
were stopped when ffu > 0.98 for each state and 

All optimizations were performed with the open- 
source MSIBI Python package we developed,^ 
which calls HOOMD-Blue^^lHS! to run the CG sim¬ 
ulations and uses MDTrajl^SEH for RDE calcula¬ 
tions and file-handling. CG simulations were run 
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at the same states as the atomistie systems. Initial 
CG eonfigurations were generated from the CG- 
mapped atomistie trajeetories at eaeh state. As a 
result of the 4:1 mapping, CG water simulations 
eontained 1,458 water beads. All CG simulations 
were run with a 10 fs timestep. The derived CG 
potential was set to 0 beyond the eutoff of 12 A. 

The surfaee tension 7 of the droplet state was 
ealeulated as 


7 = 


2-^Z ( ^zz 



(5) 


where is the length of the box in the expanded 
direetion, is the pressure eomponent in the diree- 
tion normal to the liquid-vapor interfaees, Pxx and 
Pyy are the pressure eomponents in the direetions 
lateral to the interfaees, and the angle braekets de¬ 
note a time average. The faetor of 1/2 is ineluded 
to aeeount for the two interfaees that are present in 
the droplet simulation setup. 


Results and Discussion 

Modified Chiu Potential 

We hrst eonsider the impaet of modifying the Chiu 
et al. potential to ereate a softer repulsion. Figure [T] 
plots the RDFs of the three target states for the 
original and modified potentials and the RDF of 
the 4:1 mapped state (i.e, the target data used later 
for the MS IBI optimization). The peak loeation 
of the NVT state is relatively unehanged; however, 
upon modifieation, there is a slight shift in the first 
peak for the NPT and interfaeial states, allowing 
the model to aeeess smaller separations, as was 
intended. The softer potential allows eloser eontaet 
and thus allows the MS IBI algorithm to modify 
this region of the potential where the 4:1 mapped 
atomistie water has non-zero values of the RDF. 
The density predieted with both potentials is the 
same (0.991 ± 0.003 g/mL), however, due to the 
softening of the potential, the ealeulated surfaee 
tension of the droplet ehanges from 70.3 mN/m 
to 45 mN/m after the modifieation, although we 
note this value is still suffieient for the droplet to 
maintain a stable interfaee. 



Figure 1: RDFs from simulations using the original 
and modified Chiu potentials. Top: NVT; middle: 
NPT; bottom-middle: interfaee; bottom: eompari- 
son of the two potentials. 

Potential Derivation and Validation of 
Bulk Water 

Starting from the modified Morse potential of Chiu 
et al ., the new water foreefield is optimized using 
the bulk NVT and NPT states and the interfaeial 
state. The results of the potential derivation are 
summarized in Figure where it is elear that the 
modified Chiu et al. potentials (i.e., step 0) over¬ 
estimates the struetural eorrelations, as was also 
seen in Figure [T] for both the modified and original 
potentials. After only a few iterations, the RDFs 
mateh the targets with a high degree of aeeuraey. 
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This trend is shown in Figure whieh plots the 
fitness value from Equation as a funetion of itera¬ 
tion. The value of ffit ehanges most rapidly in the 
first 3 steps of the optimization. After 10 iterations, 
the stopping eriteria are met and the optimization 
stopped. While the ehanges to the potential are 
small, there is a notieeable shift in the loeation of 
the minimum to a slightly larger r value and the po¬ 
tential beeomes slightly more attraetive. Although 
the shape of the attraetive well is mostly unehanged, 
the potential more rapidly deeays to 0 than the orig¬ 
inal Morse potential at larger r values, while the 
shape of the repulsive regime at small r is ehanged 
slightly. These subtle ehanges to the potential are 
suffieient to ereate signifieant ehanges in the RDF 
and provide exeellent eonvergenee of the struetural 
eorrelations. Note that in Figure and the RDFs 
from the interfaeial state do not deeay to 1 at large 
r. This is due to the faet that 2/3 of the box is essen¬ 
tially devoid of partieles, but the RDF is normalized 
based on the volume of the whole simulation box. 
This has no effeet on the potential update seheme, 
as both the target and CG RDFs are normalized by 
the same faetor, whieh eaneels out in Equation 
In addition to aeeurately eapturing the RDFs, the 
multi-ensemble approaeh provides an aeeurate es¬ 
timate of the density at 305 K and 1 atm. NPT 
simulations performed using the optimized CG 
foreefield find a density of 1.027 ± 0.006 g/mL, 
eompared to 1.037 ±0.004 g/mF for TIP3P water. 
This approaeh is sueeessful beeause the RDFs will 
not mateh if the pressure-density relationship is not 
satisfied, as the density is implieitly represented in 
Equationthrough the RDF terms (i.e., the RDFs 
at the NPT state will not mateh the target RDFs if 
the density is signifieantly different than the den¬ 
sity of the target state). In eontrast, the original IBI 
method proposed the use of a pressure eorreetion 
term of the form AV{r) = A(1 — rjrcut) to aeeount 
for the pressure.^ This approaeh has been sueeess¬ 
ful, but requires a somewhat arbitrary estimate of 
the parameter A. While a method exists for esti¬ 
mating A based on the virial expression,^ some 
degree of trial-and-error is still neeessary. Further¬ 
more, the multi-ensemble approaeh within MS IBI 
does not require direet ealeulation of the pressure, 
whieh often demonstrates eonsiderable fiuetuations, 
providing a simpler route to aeeount for the pres¬ 
sure in the CG model. 


ai 


ai 


oi 
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Figure 2: RDFs and potentials from the MS IBI 
potential derivation. Top: NVT; middle-top: NPT; 
middle-bottom: interfaee; bottom: potentials. The 
initial potential shows signifieant struetural eorre¬ 
lations missing from the target data. The derived 
potential at ten iterations shows exeellent struetural 
agreement with the target. 


Caleulation of the surfaee tension of the derived 
MS IBI potential yields a value of 42 mN/m, lower 
than the original Chiu et al. potential whieh was 
optimized to mateh experiment, but only slight 
perturbed from the modified potential (45 mN/m). 
This reduetion is surfaee tension appears direetly 
related to the softening of the potential, although, 
we note that this softening is required to provide an 
aeeurate mateh of the strueture. 
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Figure 3: ffu from Equation as a function of 
iteration in the potential derivation. Convergence 
with the criterion is found after 10 iterations. 

Validation and Comparison to Other 
Models 

To further explore the efficacy of the MS IBI- 
derived model, comparisons are made to other CG 
water models in the literature, namely, the fc-means 
based potential of Hadley and McCabd^ derived 
via the single state (SS) IBI procedure (here re¬ 
ferred to as the SS IBI potential) and the MAR¬ 
TINI potential.!^ These models were chosen be¬ 
cause they are short-ranged, non-polarizable, and 
4:1 models. For reference, these potentials are plot¬ 
ted in Figure]^ Note that the MS IBI and SS IBI 
potentials are numerical (as they were derived via 
IBI), while the MARTINI potential is represented 
by a 12-6 Lennard-Jones potential with a well depth 
of 1.195 kcaFmol located at a separation of 5.276 A. 
Note that all of the potentials considered in this pa¬ 
per provide a close estimate of the density of water 
at 1 atm and 305 K, as reported in Table 

Table 1: Density of CG water at 305 K, 1 atm 
with various models. 


Model 

p, g/mL 

TIP3P 

1.037 ±0.004 

MS IBI 

1.027 ±0.006 

SSIBI 

1.083 ±0.008 

MARTINI 

1.015 ±0.003 

Chiu 

0.991 ± 0.003 


First considering the SS IBI potential, it can 
be seen that the well depth is approximately 
0.5 kcaFmol weaker than the MS IBI potential and 



Figure 4: Interaction potentials from the CG water 
models compared in this work. The MS IBI and SS 
IBI potentials are numerical, derived with structure- 
based methods. MARTINI is a Lennard-Jones 12-6 
potential. 

shifted to larger separations. While this has little 
impact on the density or the structural correlations 
of NVT and NPT states (not shown), given the po¬ 
tential was optimized to match these RDFs, simula¬ 
tions of droplets show that the interfacial properties 
are not sufficiently captured. Specihcally, as shown 
in Figuresimulations of atomistic TIP3P, SS IBI, 
and MS IBI water were performed with interfaces. 
From these it can be clearly seen that the SS IBI 
potential model fills the box, rather than maintain¬ 
ing an interface. In contrast, the MS IBI model 
maintains a stable interface in agreement with the 
atomistic model. Thus, while an exact match to 
the experimental surface tension is not found for 
the MS IBI potential, it is still sufficiently strong 
enough to maintain a clear interface, providing a 
significant improvement over the SS IBI potential. 

It is also important that the potential is not so 
strong that the system can solidify at physiologi¬ 
cal conditions. For example, the MARTINI wa¬ 
ter model is known to spontaneously crystallize 
at physiologically relevant temperaturesThis 
phenomenon is enhanced by the presence of inter¬ 
faces (e.g., a lipid bilayer surface), and requires 
the addition of unphysical “antifreeze” particles to 
avoid crystallization. While we note that modifi¬ 
cations to the MARTINI w ater m odel exist (e.g., 
adding charge polarization) only the original 
MARTINI model was tested, since it more closely 
resembles the model derived via MS IBI (i.e., both 
represent 4 water molecules as a single, spherically 
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Figure 5: Simulation snapshots of droplets using 
the various models diseussed. Top: all-atom TIP3P; 
middle: SS IBI; bottom: MS IBI. Atomistie and 
MS IBI models agree, produeing a system with a 
stable interfaee, whereas SS IBI does not form a 
stable interfaee. 

symmetrie interaetion site). To test the erystalliza- 
tion tendeney, a nueleation site is generated with 
the following protoeol. A erystalline state is gen¬ 
erated by running a simulation with the MS IBI 
potential in the NVT ensemble. During this sim¬ 
ulation, the temperature is deereased from 305 K 
to 1 K over 10 ns. A subsequent CG simulation is 
run at 1000 K, where the middle-most l/8th of the 
beads are kept fixed, resulting in a eonfiguration 
that eontains a erystal seed surrounded by a fluid 
of CG water beads. The beads in the erystal seed 
are kept fixed in the nueleation site simulations, 
with interaetions identieal to the fluid interaetions. 
While neither models shows a tendeney to freeze at 
305 K in the absenee of a nueleation site over 100 
ns of simulation, the MARTINI model rapidly crys¬ 
tallizes in the presence of a nueleation site, while 
the MS IBI potentials remains fluid Q. Note, for 
a direct comparison with the MS IBI model de¬ 
rived here, antifreeze particles were not used with 
the MARTINI model. To ensure that the MS IBI 
system is not an amorphous solid structure, the ra¬ 
tio of the diffusion coefficients with and without 
a nueleation site were calculated for each model. 
As shown in Table the diffusion coefficient of 


the MS IBI potential model remains relatively un¬ 
changed when a nueleation site is added, whereas 
a significant drop is seen for the MARTINI model, 
resulting from crystallization. Additionally, Fig¬ 
ure |7] plots the RDF of the MARTINI model for 
the bulk NVT state as compared to the 4:1 mapped 
target data. Clearly, the MARTINI potential does 
not accurately capture the structural correlations 
of bulk water, further demonstrating the significant 
improvement of the MS IBI model in reproducing 
key properties of water. 


Figure 6: Configurations from simulations in the 
presence of a nueleation site with the MARTINI 
(left) and MS IBI (right) models. CG water beads 
colored silver were kept fixed during the simula¬ 
tions, but were treated as the same type as blue 
particles (i.e., the color is different to show the 
nueleation site). 

Table 2: Ratio of diffusion coefficients from sim¬ 
ulations with {Dnuc) and without {Dt,uik) a nucle- 
ation site with different potentials. Diffusion co¬ 
efficient D calculated from the slope of a linear 
fit to the long-time MSD, using MSD = 6Dt. 


Model 

Dnuc/ 

MS IBI 

0.88 

MARTINI 

0.02 


Conclusions 

In this work, the MS IBI method was used to derive 
the interactions for a 4:1 mapped CG water model. 
An improvement over previous models is made by 
simultaneously matching the fluid structure to tar¬ 
get data from bulk and interfacial states. It was 
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Figure 7: RDFs of the MARTINI model and the 
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